A Parsing Method for Identifying Words in Mandarin Chinese Sentences
Authors
Abstract
This paper presents a parsing method for identifying words in Mandarin Chinese sentences. The identification system is composed of a Tomita parser, augmented with tests that were originally part of the English-Chinese machine translation system CCL-ECMT, together with the associated augmented context-free grammar for word composition. The simple augmented grammar with the score function effectively captures the intuitive idea of the longest possible composition of Chinese words in sentences and, at the same time, takes into consideration the frequency counts of words. The identification rate of this system for the corpora taken from books and a newspaper is 99.6%. The identification system is simple, but its identification rate is relatively high. The minimum element for word-composition parsing is the character, whereas the minimum element for sentence parsing is the Chinese word. The system has the potential to incorporate phrase structures and semantic checking. In this way, word identification, syntactic analysis, and even semantic analysis can be organized into a single phase. The results of testing the word identification on corpora taken from books and a Chinese newspaper are also presented.
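The scoring idea in the abstract (prefer the longest possible word composition, then use frequency counts) can be illustrated with a small sketch. This is not the paper's Tomita parser or its actual score function, neither of which is given here. It is a plain dynamic-programming segmenter under stated assumptions: the lexicon and its frequency counts are made up, "longest composition" is approximated by minimizing the number of words, and ties are broken by the summed log frequency.

```python
import math

# Hypothetical toy lexicon: word -> frequency count (all values are made up).
LEXICON = {"研究": 50, "研究生": 5, "生命": 40, "命": 3, "起源": 10}


def segment(sentence, lexicon=LEXICON):
    """Split a character string into words.

    Assumed score: fewest words first (a stand-in for "longest possible
    composition"), then highest sum of log frequencies as a tie-breaker.
    Characters not covered by the lexicon fall back to one-character words
    with an assumed frequency of 1.
    """
    max_len = max(map(len, lexicon))
    # best[i] holds (score, words) for sentence[:i],
    # where score = (-word_count, sum_of_log_frequencies).
    best = [((0, 0.0), [])]
    for end in range(1, len(sentence) + 1):
        candidates = []
        for start in range(max(0, end - max_len), end):
            word = sentence[start:end]
            if word in lexicon or len(word) == 1:
                (neg_count, log_freq), words = best[start]
                freq = lexicon.get(word, 1)
                score = (neg_count - 1, log_freq + math.log(freq))
                candidates.append((score, words + [word]))
        # The single-character fallback guarantees at least one candidate.
        best.append(max(candidates, key=lambda c: c[0]))
    return best[-1][1]


print(segment("研究生命起源"))
```

In this example both 研究生/命/起源 and 研究/生命/起源 use three words, so the assumed frequency counts decide in favor of the second reading. The units being combined are characters rather than words, which matches the abstract's description of word-composition parsing.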
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
Publication date: 1991